Gaussian graphical models are semi-algebraic subsets of the cone
of positive definite covariance matrices. Submatrices with low rank 
correspond to generalizations of conditional independence constraints 
on collections of random variables. We give a precise graph-theoretic 
characterization of when submatrices of the covariance matrix have 
small rank for a general class of mixed graphs that includes directed 
acyclic and undirected graphs as special cases. Our new trek separa- 
tion criterion generalizes the familiar d-separation criterion. Proofs 
are based on the trek rule, the resulting matrix factorizations, and 
classical theorems of algebraic combinatorics on the expansions of 
determinants of path polynomials. 

1. Introduction. Given a graph G, a graphical model is a family of 
probability distributions that satisfy some conditional independence con- 
straints which are determined by separation criteria in terms of the graph. 
In the case of normal random variables, conditional independence constraints 
correspond to low rank submatrices of the covariance matrix $\Sigma$ of a special
type. Thus for Gaussian graphical models, the graphical separation criteria 
correspond to special submatrices of the covariance matrix having low rank. 

Consider first the case where G is a directed acyclic graph. In this case, a
conditional independence statement $X_A \perp\!\!\!\perp X_B \mid X_C$ holds for
every distribution consistent with the graphical model if and only if C
d-separates A from B in G. For normal random variables the conditional
independence constraint $X_A \perp\!\!\!\perp X_B \mid X_C$ is equivalent to the
condition rank $\Sigma_{A \cup C, B \cup C} = \#C$, where
$\Sigma_{A \cup C, B \cup C}$ is the submatrix of the covariance matrix $\Sigma$
with row indices $A \cup C$ and column indices $B \cup C$. However, the drop of
rank of a general submatrix $\Sigma_{A,B}$ does not necessarily correspond to a
conditional independence statement that is valid for the graph, and will not,
in general, come from a d-separation criterion. Our main result for directed
graphical models is a new separation criterion (t-separation) which gives a
complete characterization of when submatrices of the covariance matrix will
drop rank, and what the generic lower rank of that matrix will be.

One of the main reasons for searching for necessary and sufficient condi- 
tions for matrices to drop rank comes from the search for a unified perspec- 
tive on rank conditions implied by the d-separation criterion and the tetrad 
representation theorem [12], which characterizes 2x2 vanishing determinants
in directed acyclic graphs. The t-separation criterion unifies both of
these results under a simple and more general umbrella. 

A second reason for introducing t-separation is that it provides a new 
set of tools for performing constraint based inference in Gaussian graph- 
ical models. This approach was pioneered by the TETRAD program [10] 
where vanishing tetrad constraints are used to infer the structure of hidden 
variable graphical models. The mathematical underpinning of the TETRAD 
program is the above mentioned tetrad representation theorem [12]. In fact, 
the impetus for this project was a desire to develop a better understanding of 
the tetrad representation theorem. The original proof of the tetrad represen- 
tation theorem is lengthy and complicated, and some simplifications appear 
in subsequent work [11, 13]. Our result has the advantage of being
considerably broader, while our proof is more elementary. The notion that algebraic
determinantal constraints could be useful for inferring graphical structures 
is further supported by recent results on the distribution of the evaluation 
of determinants of Wishart matrices [1], which would be an essential tool 
for developing Wald-type tests in this setting. 

Section 2 gives the setup of Gaussian graphical models and states the
main results on t-separation. To describe the main result we need to recall 
the notion of treks, which are special paths in the graph G. These are the 
main objects used in the trek rule, a combinatorial parametrization of co- 
variance matrices that belong to the Gaussian graphical model. We make a 
special distinction between general treks and simple treks and introduce two 
trek rules. These results are probably well-known to experts, but are difficult 
to find in the literature. Then we make precise the t-separation criterion and 
state our main results about it. This section is divided into subsections, stat- 
ing our results first for directed graphical models, then undirected graphical 
models, and finally the more general mixed graphs. The purpose for this 
division is twofold: it extracts the two most common classes of graphical 
models and it mirrors the structure of the proof of the main results. 

Section 3 is concerned with the proofs of the main results. The main idea
is to exploit the trek rule which expresses covariances as polynomials in 
terms of treks in the graph G. The expansion of determinants of matrices of
path polynomials is a classical problem in algebraic combinatorics covered
by the Gessel-Viennot-Lindström Lemma, which we exploit in our proof.
The final tool is Menger's Theorem on flows in graphs. 

Acknowledgments. We thank Mathias Drton for suggesting this prob- 
lem to us. The referees and associate editor provided many useful comments 
which have led to this improved version. Jan Draisma, who was originally
an anonymous referee on this paper, provided the first proof of Theorem 
2.17, which was a conjecture in an earlier version of the paper. 

2. Treks and t-separation. This section provides background on and 
definitions of treks as well as the statements of our main results on t- 
separation for Gaussian graphical models. We describe necessary and suffi- 
cient conditions for directed and undirected graphs first, and then address 
the general case of mixed graphs. The proofs in Section 3 also follow the
same basic format. 

2.1. Directed Graphs. Let G be a directed acyclic graph with vertex set
$V(G) = [m] := \{1, 2, \ldots, m\}$. We assume G is topologically ordered, that
is, we have $i < j$ whenever $i \to j \in E(G)$. A parent of a vertex j is a
node $i \in V(G)$ such that $i \to j$ is an edge in G. The set of all parents
of a vertex j is denoted pa(j). Given such a directed acyclic graph, one
introduces a family of normal random variables that are related to each
other by recursive regressions.

To each node i in the graph, we introduce a random variable $X_i$ and
a random variable $\epsilon_i$. The $\epsilon_i$ are independent normal random
variables $\epsilon_i \sim \mathcal{N}(0, \phi_i)$ with $\phi_i > 0$. We assume
that all our random variables have mean zero for simplicity. The recursive
regression property of the DAG gives an expression for each $X_j$ in terms of
$\epsilon_j$, those $X_i$ with $i < j$, and some regression parameters
$\lambda_{ij}$ assigned to the edges $i \to j$ in the graph:

\[ X_j = \sum_{i \in pa(j)} \lambda_{ij} X_i + \epsilon_j. \]

From this recursive sequence of regressions, one can solve for the
covariance matrix $\Sigma$ of the jointly normal random vector X. This
covariance matrix is given by a simple matrix factorization in terms of the
regression parameters and the variance parameters $\phi_i$. Let $\Phi$ be the
diagonal matrix $\Phi = diag(\phi_1, \ldots, \phi_m)$. Let L be the m x m upper
triangular matrix with $L_{ij} = \lambda_{ij}$ if $i \to j$ is an edge in G and
$L_{ij} = 0$ otherwise. Set $\Lambda = I - L$ where I is the m x m identity
matrix.

Proposition 2.1. [9, Section 8] The variance-covariance matrix of the
normal random variable $X \sim \mathcal{N}(0, \Sigma)$ is given by the matrix
factorization:

\[ \Sigma = \Lambda^{-T} \Phi \Lambda^{-1}. \]
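
As an illustration of Proposition 2.1, the following Python sketch evaluates the factorization exactly with rational arithmetic. The chain 1 -> 2 -> 3, the parameter values, and the helper names `mat_mul` and `covariance` are assumptions made for this example only; the inverse of $\Lambda = I - L$ is obtained from the finite Neumann series, which is valid because L is strictly upper triangular.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def covariance(m, lam, phi):
    """Sigma = Lambda^{-T} Phi Lambda^{-1} for a DAG on vertices 1..m.

    lam maps an edge (i, j) with i < j to its regression coefficient;
    phi lists the error variances phi_1, ..., phi_m.
    """
    identity = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    L = [[Fraction(lam.get((i + 1, j + 1), 0)) for j in range(m)]
         for i in range(m)]
    # L is strictly upper triangular, hence nilpotent, so
    # (I - L)^{-1} = I + L + L^2 + ... + L^{m-1}.
    lam_inv, power = identity, identity
    for _ in range(m - 1):
        power = mat_mul(power, L)
        lam_inv = [[a + b for a, b in zip(r, s)]
                   for r, s in zip(lam_inv, power)]
    lam_inv_t = [list(col) for col in zip(*lam_inv)]
    Phi = [[Fraction(phi[i]) if i == j else Fraction(0) for j in range(m)]
           for i in range(m)]
    return mat_mul(mat_mul(lam_inv_t, Phi), lam_inv)

# Hypothetical chain 1 -> 2 -> 3 with made-up parameter values.
sigma = covariance(3, {(1, 2): 2, (2, 3): 3}, [1, 1, 1])
```

For this chain the regressions give $X_2 = 2X_1 + \epsilon_2$ and $X_3 = 3X_2 + \epsilon_3$, so for instance $\sigma_{22} = 4 + 1 = 5$ and $\sigma_{13} = 3\sigma_{12} = 6$, in agreement with the entries of `sigma`.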

Given two subsets $A, B \subseteq [m]$, we let
$\Sigma_{A,B} = (\sigma_{ab})_{a \in A, b \in B}$ be the submatrix of covariances
with row index set A and column index set B. If A = B = [m], we abbreviate
and say that $\Sigma_{[m],[m]} = \Sigma$. Conditional independence statements
for normal random variables can be detected by investigating the
determinants of submatrices of the covariance matrix [13].



Proposition 2.2. Let $X \sim \mathcal{N}(\mu, \Sigma)$ be a normal random
variable, and let A, B, and C be disjoint subsets of [m]. Then the conditional
independence statement $X_A \perp\!\!\!\perp X_B \mid X_C$ holds for X if and
only if $\Sigma_{A \cup C, B \cup C}$ has rank $\#C$.
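
The rank test of Proposition 2.2 is easy to carry out exactly. The sketch below is only an illustration: the covariance matrix is that of a hypothetical chain 1 -> 2 -> 3 with $\lambda_{12} = 2$, $\lambda_{23} = 3$ and unit error variances, and the helper names `rank`, `submatrix` and `is_cond_indep` are not from the paper.

```python
from fractions import Fraction

def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def submatrix(S, rows, cols):
    """Submatrix of S for 1-based lists of row and column indices."""
    return [[S[i - 1][j - 1] for j in cols] for i in rows]

def is_cond_indep(S, A, B, C):
    """Rank test of Proposition 2.2 for disjoint index lists A, B, C."""
    return rank(submatrix(S, A + C, B + C)) == len(C)

# Covariance of the hypothetical chain 1 -> 2 -> 3 described above.
sigma = [[1, 2, 6], [2, 5, 15], [6, 15, 46]]
chain_ci = is_cond_indep(sigma, [1], [3], [2])   # X_1 and X_3 given X_2
marginal_ci = is_cond_indep(sigma, [1], [3], [])  # X_1 and X_3 marginally
```

In this example the submatrix with rows {1, 2} and columns {3, 2} has determinant 6 * 5 - 2 * 15 = 0, so the conditional statement holds, while $\sigma_{13} = 6 \neq 0$ rules out marginal independence.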

Often in the statistical literature, the conditional independence conditions
of a normal random variable are specified by saying that partial correlations
are equal to zero. Proposition 2.2 is just an algebraic reformulation of that
standard characterization.

A classic result of the graphical models literature is the characterization 
of precisely which conditional independence statements hold for all densities 
that belong to the graphical model. This characterization is determined by 
the d-separation criterion. 

Definition 2.3. Let A, B, and C be disjoint subsets of [m]. The set 
C directed separates or d-separates A and B if every path (not necessarily 
directed) in G connecting a vertex $i \in A$ to a vertex $j \in B$ contains a vertex
k that is either 

1. a non-collider that belongs to C or 

2. a collider that does not belong to C and has no descendants that 
belong to C, 

where k is a collider if there exist two edges $a \to k$ and $b \to k$ on the path
and a non-collider otherwise.

Theorem 2.4 (Conditional independence for directed graphical models).
[7] A set C d-separates A and B in G if and only if the conditional
independence statement $X_A \perp\!\!\!\perp X_B \mid X_C$ holds for every
distribution in the graphical model associated to G.



Combining Proposition 2.2 and Theorem 2.4 gives a characterization of
when all the $(\#C + 1) \times (\#C + 1)$ minors of a submatrix
$\Sigma_{A \cup C, B \cup C}$ must vanish. However, not every vanishing
subdeterminant of a covariance matrix in a Gaussian graphical model comes
from a d-separation criterion, as the following example illustrates.



Example 2.5 (Choke point). Consider the graph in Figure 2.5 with five
vertices and five edges. In this graph, the determinant
$|\Sigma_{13,45}| = 0$ for any choice of model parameters. However, this
vanishing rank condition does not follow from any single d-separation
criterion/conditional independence statement that is implied by the graph. □

Our main result is an explanation of where these extra vanishing deter- 
minants come from, for Gaussian directed graphical models. Before we give 
the precise explanation in terms of treks, we want to first explain how they 
enter the story. 

Definition 2.6. A trek in G from i to j is an ordered pair of directed
paths $(P_1, P_2)$ where $P_1$ has sink i, $P_2$ has sink j, and both $P_1$ and
$P_2$ have the same source k. The common source k is called the top of the
trek, denoted $top(P_1, P_2)$. Note that one or both of $P_1$ and $P_2$ may
consist of a single vertex, i.e., a path with no edges. A trek $(P_1, P_2)$ is
simple if the only common vertex among $P_1$ and $P_2$ is the common source
$top(P_1, P_2)$. We let T(i,j) and S(i,j) denote the sets of all treks and all
simple treks from i to j, respectively.



Expanding the matrix product for $\Sigma$ in Proposition 2.1 gives the
following trek rule for the covariance $\sigma_{ij}$:

\[ \sigma_{ij} = \sum_{(P_1,P_2) \in T(i,j)} \phi_{top(P_1,P_2)} \lambda^{P_1} \lambda^{P_2}, \tag{1} \]

where for each path P, $\lambda^P$ is the path monomial of P, defined by

\[ \lambda^P = \prod_{k \to l \in P} \lambda_{kl}. \]

There is another rule for parameterizing the covariance matrices, which
involves sums over only the set S(i,j) of simple treks. To describe this, we
introduce an alternate parameter $a_i$ associated to each node i in the graph,
and defined by the rule:

\[ a_i = \sigma_{ii} = \sum_{(P_1,P_2) \in T(i,i)} \phi_{top(P_1,P_2)} \lambda^{P_1} \lambda^{P_2}. \]

With the definition of the alternate parameter $a_i$, this leads to the
parametrization, called the simple trek rule:

\[ \sigma_{ij} = \sum_{(P_1,P_2) \in S(i,j)} a_{top(P_1,P_2)} \lambda^{P_1} \lambda^{P_2}. \tag{2} \]

The simple trek rule is also known as Wright's method of path analysis
[14]. While we will depend most heavily on the trek rule in this paper, the
simple trek rule also has its uses. In particular, the simple trek rule played
an important role in the study of Gaussian tree models in [13].
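
The trek rule (1) can be checked by brute-force enumeration on a small example. The sketch below assumes a hypothetical DAG with edges 1 -> 2, 1 -> 3, 2 -> 3 and made-up parameters; the function names are illustrative. For this graph the regressions give $X_3 = 13\epsilon_1 + 5\epsilon_2 + \epsilon_3$, so the expected values are $\sigma_{13} = 13$, $\sigma_{23} = 31$ and $\sigma_{33} = 195$.

```python
from fractions import Fraction
from math import prod

def directed_paths(edges, src, dst):
    """All directed paths from src to dst in a DAG, as vertex tuples."""
    if src == dst:
        return [(src,)]
    return [(src,) + rest
            for (a, b) in edges if a == src
            for rest in directed_paths(edges, b, dst)]

def path_monomial(lam, path):
    """Product of lambda_kl over the edges k -> l of the path."""
    return prod((lam[e] for e in zip(path, path[1:])), start=Fraction(1))

def trek_rule(m, lam, phi, i, j):
    """sigma_ij as the sum over all treks (P1, P2) from i to j."""
    total = Fraction(0)
    for top in range(1, m + 1):
        for p1 in directed_paths(lam, top, i):      # P1 has sink i
            for p2 in directed_paths(lam, top, j):  # P2 has sink j
                total += (phi[top - 1] * path_monomial(lam, p1)
                          * path_monomial(lam, p2))
    return total

# Hypothetical DAG and parameter values, chosen only for illustration.
lam = {(1, 2): 2, (1, 3): 3, (2, 3): 5}
phi = [1, 1, 1]
sigma_23 = trek_rule(3, lam, phi, 2, 3)
```

Here the treks from 2 to 3 are the two with top 1 (contributing 2 * 3 and 2 * 10) and the one with top 2 (contributing 5), which sum to 31.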

The fact that treks arise in the expressions for $\sigma_{ij}$ suggests that any
combinatorial rule for the vanishing of a determinant $\Sigma_{A,B}$ should depend
on treks in some way. This leads us to introduce the following separation 
criterion that involves treks. 

Definition 2.7. Let A, B, $C_A$, and $C_B$ be four subsets of V(G) which
need not be disjoint. We say that the pair $(C_A, C_B)$ trek separates (or
t-separates) A from B if for every trek $(P_1, P_2)$ from a vertex in A to a
vertex in B, either $P_1$ contains a vertex in $C_A$ or $P_2$ contains a vertex
in $C_B$.



Remark. The following facts follow immediately from Definition 2.7.

1. Since a trek may consist of a single vertex v, or more precisely a pair
of paths with zero edges, we must have $A \cap B \subseteq C_A \cup C_B$
whenever $(C_A, C_B)$ t-separates A from B.

2. The pair $(C_A, C_B)$ t-separates A from B if and only if the pair
$(C_B, C_A)$ t-separates B from A.

3. Each of the pairs $(A, \emptyset)$ and $(\emptyset, B)$ always t-separates
A from B, so we can always find a t-separating set of size min(#A, #B). Our
results in this paper will show that t-separation gives nontrivial
restrictions on the covariance matrix when
$\#C_A + \#C_B < \min(\#A, \#B)$.
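
Definition 2.7 and the facts in the remark can be tested directly by enumerating treks. The following sketch is a brute-force check, not an efficient algorithm; the five-vertex DAG is a hypothetical example, and the function names are illustrative.

```python
def directed_paths(edges, src, dst):
    """All directed paths from src to dst in a DAG, as vertex tuples."""
    if src == dst:
        return [(src,)]
    return [(src,) + rest
            for (a, b) in edges if a == src
            for rest in directed_paths(edges, b, dst)]

def t_separates(m, edges, CA, CB, A, B):
    """True if (CA, CB) t-separates A from B in a DAG on vertices 1..m."""
    for i in A:
        for j in B:
            for top in range(1, m + 1):
                for p1 in directed_paths(edges, top, i):      # sink i
                    for p2 in directed_paths(edges, top, j):  # sink j
                        # The trek must be blocked on its left or right side.
                        if not (set(p1) & CA or set(p2) & CB):
                            return False
    return True

# Hypothetical DAG: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4, 4 -> 5.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
A, B = {1, 3}, {4, 5}
right_block = t_separates(5, edges, set(), {4}, A, B)
left_block = t_separates(5, edges, {4}, set(), A, B)
```

In this graph every trek from {1, 3} to {4, 5} has a right-hand path that passes through vertex 4, so `right_block` is true, whereas placing 4 on the left-hand side blocks nothing.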

The combinatorial notion of t-separation allows us to give a complete
characterization of when submatrices of the covariance matrix can drop rank.
This is the main result for Gaussian directed graphical models; it will be
proved in Section 3.1.

Theorem 2.8. [Trek separation for directed graphical models] The
submatrix $\Sigma_{A,B}$ has rank less than or equal to r for all covariance
matrices consistent with the graph G if and only if there exist subsets
$C_A, C_B \subseteq V(G)$ with $\#C_A + \#C_B \le r$ such that $(C_A, C_B)$
t-separates A from B. Consequently,

\[ rk(\Sigma_{A,B}) \le \min\{\#C_A + \#C_B : (C_A, C_B) \text{ t-separates } A \text{ from } B\}, \]

and equality holds for generic covariance matrices consistent with G.

Here and throughout the paper, the term generic means that the condition 
holds on a dense open subset of the parameter space. Since rank conditions 
are algebraic, this means that the set where the inequality is strict is an 
algebraic subset of parameter space with positive codimension (see [2] for 
background on this algebraic terminology). 
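
To see Theorem 2.8 at work, one can compare the size of a smallest t-separating pair, found by exhaustive search, with the exact rank of $\Sigma_{A,B}$ at one choice of parameters. The sketch below does this for a hypothetical five-vertex DAG; the parameter values are arbitrary rationals that happen to attain the generic rank, and all function names are illustrative. The search is exponential in the number of vertices and is meant only for toy examples.

```python
from fractions import Fraction
from itertools import combinations

def directed_paths(edges, src, dst):
    """All directed paths from src to dst in a DAG, as vertex tuples."""
    if src == dst:
        return [(src,)]
    return [(src,) + rest
            for (a, b) in edges if a == src
            for rest in directed_paths(edges, b, dst)]

def t_separates(m, edges, CA, CB, A, B):
    """Definition 2.7, checked over every trek from A to B."""
    for i in A:
        for j in B:
            for top in range(1, m + 1):
                for p1 in directed_paths(edges, top, i):
                    for p2 in directed_paths(edges, top, j):
                        if not (set(p1) & CA or set(p2) & CB):
                            return False
    return True

def min_t_separator(m, edges, A, B):
    """Smallest #C_A + #C_B over t-separating pairs (exhaustive search)."""
    V = range(1, m + 1)
    for size in range(min(len(A), len(B)) + 1):
        for a in range(size + 1):
            for CA in combinations(V, a):
                for CB in combinations(V, size - a):
                    if t_separates(m, edges, set(CA), set(CB), A, B):
                        return size

def covariance(m, lam, phi):
    """Covariance of the recursive regressions, built column by column."""
    S = [[Fraction(0)] * m for _ in range(m)]
    for j in range(m):
        pa = [(i, Fraction(c)) for (i, k), c in lam.items() if k == j + 1]
        for i in range(j):
            # Cov(X_i, X_j) = sum over parents k of j of lambda_kj Cov(X_i, X_k)
            S[i][j] = S[j][i] = sum((c * S[i][k - 1] for k, c in pa), Fraction(0))
        S[j][j] = phi[j] + sum((c * S[k - 1][j] for k, c in pa), Fraction(0))
    return S

def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

# Hypothetical DAG 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4, 4 -> 5 with made-up values.
lam = {(1, 2): 2, (1, 3): 3, (2, 4): 5, (3, 4): 7, (4, 5): 11}
sigma = covariance(5, lam, [1, 2, 3, 4, 5])

def sub_rank(A, B):
    """Rank of the submatrix of sigma with rows A and columns B."""
    return rank([[sigma[i - 1][j - 1] for j in B] for i in A])
```

For A = {1, 3} and B = {4, 5} both quantities equal 1, and for A = {1, 2} and B = {3, 4} both equal 2, as the theorem predicts for generic parameters.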

Example 2.9. [Choke point, cont.] Returning to the graph from
Example 2.5, we see that $(\emptyset, \{4\})$ t-separates {1,3} from {4,5},
which implies that the submatrix $\Sigma_{13,45}$ has rank at most one for
every matrix that belongs to the model. Thus t-separation explains this
extra vanishing minor that d-separation misses.

Readers familiar with the tetrad representation theorem will recognize 
that {4} is a choke point between {1,3} and {4,5} in G. In particular, 
Theorem 2.8 includes the tetrad representation theorem as a special case.

Corollary 2.10 (Tetrad Representation Theorem [12]). The tetrad
$\sigma_{ik}\sigma_{jl} - \sigma_{il}\sigma_{jk}$ is zero for all covariance
matrices consistent with the graph G if and only if there is a node c in the
graph such that either $(\{c\}, \emptyset)$ or $(\emptyset, \{c\})$ t-separates
{i, j} from {k, l}.
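
A vanishing tetrad is easy to observe numerically. The sketch below uses a hypothetical five-vertex DAG in which vertex 4 is the only parent of vertex 5, so that 4 is a choke point between {1, 3} and {4, 5}; this edge set and the parameter values are assumptions for illustration and need not coincide with the graph of Example 2.5.

```python
from fractions import Fraction

def covariance(m, lam, phi):
    """Covariance of the recursive regressions, built column by column."""
    S = [[Fraction(0)] * m for _ in range(m)]
    for j in range(m):
        pa = [(i, Fraction(c)) for (i, k), c in lam.items() if k == j + 1]
        for i in range(j):
            # Cov(X_i, X_j) = sum over parents k of j of lambda_kj Cov(X_i, X_k)
            S[i][j] = S[j][i] = sum((c * S[i][k - 1] for k, c in pa), Fraction(0))
        S[j][j] = phi[j] + sum((c * S[k - 1][j] for k, c in pa), Fraction(0))
    return S

def tetrad(S, i, j, k, l):
    """sigma_ik * sigma_jl - sigma_il * sigma_jk with 1-based indices."""
    s = lambda a, b: S[a - 1][b - 1]
    return s(i, k) * s(j, l) - s(i, l) * s(j, k)

# Hypothetical DAG 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4, 4 -> 5 with made-up values.
lam = {(1, 2): 2, (1, 3): 3, (2, 4): 5, (3, 4): 7, (4, 5): 11}
sigma = covariance(5, lam, [1, 2, 3, 4, 5])
choke_tetrad = tetrad(sigma, 1, 3, 4, 5)   # blocked by the choke point 4
other_tetrad = tetrad(sigma, 1, 2, 3, 4)   # no single choke point here
```

Because $X_5 = \lambda_{45} X_4 + \epsilon_5$, column 5 of $\Sigma_{13,45}$ is $\lambda_{45}$ times column 4, which is why the first tetrad vanishes for every parameter choice, while the second one is nonzero at these values.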

Since conditional independence in a directed graphical model corresponds 
to the vanishing of subdeterminants of the covariance matrix, the t-separation
criterion can be used to characterize these conditional independence state- 
ments, as well. 

Theorem 2.11. The conditional independence statement
$X_A \perp\!\!\!\perp X_B \mid X_C$ holds for the graph G if and only if there
is a partition $C_A \cup C_B = C$ of C such that $(C_A, C_B)$ t-separates
$A \cup C$ from $B \cup C$ in G.



Proof. The conditional independence statement holds for the graph G
if and only if the submatrix of the covariance matrix
$\Sigma_{A \cup C, B \cup C}$ has rank $\#C$. By trek separation for directed
graphical models, this holds if and only if there exists a pair of sets
$D_A$ and $D_B$, with $\#D_A + \#D_B = \#C$ such that $(D_A, D_B)$ t-separates
$A \cup C$ from $B \cup C$. Among the treks from $A \cup C$ to $B \cup C$ are
the lone vertices $c \in C$. Hence, $C \subseteq D_A \cup D_B$. Since
$\#D_A + \#D_B = \#C$, we must have $D_A \cup D_B = C$ and these two sets form
a partition of C. □



Theorem 2.11 immediately implies that d-separation is a special case of
t-separation. Yanming Di [3] found a direct combinatorial proof of this fact
after we made a preliminary version of this paper available. 

Corollary 2.12. A set C d-separates A and B in G if and only if there
is a partition $C = C_A \cup C_B$ such that $(C_A, C_B)$ t-separates
$A \cup C$ from $B \cup C$.

While t-separation includes d-separation, and the vanishing minors of 
conditional independence, as a special case, it also seems to capture some 
new vanishing minor conditions that do not follow from d-separation. The 
most interesting cases of this seem to occur when $C_A \cap C_B \neq \emptyset$.



Example 2.13 (Spiders). Consider the graph in Figure 2.13, which we 
call a spider. 




Clearly, we have that $(\{c\}, \{c\})$ t-separates A from B, so that the
submatrix $\Sigma_{A,B}$ has rank at most 2. Although this rank condition must
be implied by CI rank constraints on $\Sigma$ and the fact that $\Sigma$ is
positive definite, it does not appear to be easily derivable from these
constraints. □

2.2. Undirected Graphs. For Gaussian undirected graphical models, the
allowable covariance matrices are specified by placing restrictions on the
entries of the concentration matrix. In particular, let G be an undirected
graph, with edge set E. We consider all covariance matrices $\Sigma$ such that
$(\Sigma^{-1})_{ij} = 0$ for all $i - j \notin E(G)$.

As in the case of directed acyclic graphs, it is known that conditional 
independence constraints characterize the possible probability distributions 
for positive densities [7]. Indeed, in the Gaussian case, the pairwise
constraints $X_i \perp\!\!\!\perp X_j \mid X_{[m] \setminus \{i,j\}}$ for
$i - j \notin E(G)$ characterize the distributions that belong to the model.
As in the case of directed graphical models, general conditional independence
constraints $X_A \perp\!\!\!\perp X_B \mid X_C$ are characterized by
a separation criterion.

If A, B, and C are three subsets of vertices of an undirected graph G, not 
necessarily disjoint, we say that C separates A and B if every path from a 
vertex in A to a vertex in B contains some vertex of C. 

Theorem 2.14 (Conditional Independence for Undirected Graphical
Models). For disjoint subsets A, B, and $C \subseteq [m]$ the conditional
independence statement $X_A \perp\!\!\!\perp X_B \mid X_C$ holds for the graph
G if and only if C separates A and B.

Since conditional independence for normal random variables corresponds
to the vanishing of the minors of submatrices of the form
$\Sigma_{A \cup C, B \cup C}$ it is natural to ask what conditions determine
the vanishing of an arbitrary minor $\Sigma_{A,B}$. We will show that the path
separation criterion also characterizes the vanishing of arbitrary minors for
the undirected graphical model.

Theorem 2.15. The submatrix $\Sigma_{A,B}$ has rank less than or equal to r
for all covariance matrices consistent with the graph G if and only if there
is a set $C \subseteq V(G)$ with $\#C \le r$ such that C separates A and B.
Consequently,

\[ rk(\Sigma_{A,B}) \le \min\{\#C : C \text{ separates } A \text{ and } B\}, \]

and equality holds for generic covariance matrices consistent with G.
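
A small exact computation illustrates Theorem 2.15. The sketch below assumes a hypothetical undirected path graph 1 - 2 - 3 with a made-up concentration matrix K whose (1,3) entry is zero; the helper name `inverse` is illustrative. With the overlapping sets A = {1, 2} and B = {2, 3}, the set C = {2} separates A and B, so $\Sigma_{A,B}$ should have rank at most one.

```python
from fractions import Fraction

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row]
           + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical concentration matrix for the path 1 - 2 - 3 (K_13 = 0).
K = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
sigma = inverse(K)

# 2 x 2 minor with rows A = {1, 2} and columns B = {2, 3}.
minor_AB = sigma[0][1] * sigma[1][2] - sigma[0][2] * sigma[1][1]
# For comparison, the principal minor with rows and columns {1, 2}.
minor_AA = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
```

Here `minor_AB` vanishes exactly, while the principal minor does not, consistent with the fact that no single vertex separates {1, 2} from itself.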



Note that the sets A, B, and C need not be disjoint in Theorem 2.15. We
will provide a proof of Theorem 2.15 in Section 3.2, using the combinatorial
expansions of determinants. Unlike in the case of directed acyclic graphs, we
do not find any new constraints that were not trivially implied by conditional
independence.

2.3. Mixed Graphs. In this section, we describe our results for general
classes of mixed graphs, that is, graphs that can involve directed edges
$i \to j$, undirected edges $i - j$, and bidirected edges
$i \leftrightarrow j$. We assume that in our mixed graphs there is a partition
of the vertices of the graph $U \cup W = V(G)$, such that all undirected edges
have their vertices in U, all bidirected edges have their vertices in W, and
any directed edge with a vertex in U and a vertex in W must be of the form
$u \to w$ where $u \in U$ and $w \in W$. With all of these assumptions on our
mixed graph, we can order the vertices in such a way that all vertices in U
come before the vertices in W and whenever $i \to j$ is a directed edge, we
have $i < j$. We assume that the subgraph on directed edges is acyclic. Note
that we allow a pair of vertices to be connected by both a directed edge
$i \to j$ and a bidirected edge $i \leftrightarrow j$ or undirected edge
$i - j$. With this setup, both ancestral graphs [9] and chain graphs [1] occur
as special cases.

Now we introduce three matrices, which are determined by the three
different types of edges in the graph. We first let $\Lambda$ be the matrix
with rows and columns indexed by V(G) which is defined by
$\Lambda_{ii} = 1$, $\Lambda_{ij} = -\lambda_{ij}$ if $i \to j \in E(G)$ and
$\Lambda_{ij} = 0$ otherwise. Each $\lambda_{ij}$ is a real parameter
associated to a directed edge in G, though they no longer necessarily have
the interpretation of regression coefficients. Next, we let K be a symmetric
positive definite matrix, with rows and columns indexed by U, such that
$K_{ij} = 0$ if $i - j \notin E(G)$. Each entry $K_{ij}$ with $i \neq j$ is a
parameter associated to an undirected edge in G. Finally, we let
$\Phi = (\phi_{ij})$ be a symmetric positive definite matrix, with rows and
columns indexed by W, such that $\phi_{ij} = 0$ if
$i \leftrightarrow j \notin E(G)$. Each $\phi_{ij}$ with $i \neq j$ is a
parameter associated to a bidirected edge in G.

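From the three matrices $\Lambda$, K and $\Phi$, defined as above, we obtain
the covariance matrix of our mixed graphical model:

\[ \Sigma = \Lambda^{-T} \begin{pmatrix} K^{-1} & 0 \\ 0 & \Phi \end{pmatrix} \Lambda^{-1}. \]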

Note that this representation parametrizes the Gaussian ancestral graph 
model in the case where G is an ancestral graph [9], and chain graph models
under the alternative Markov property [1] , when G is a chain graph. 

We use a path expansion in Section 3.3 to express this factorization as
a power series of sums of paths, analogous to the polynomial expressions
in terms of treks that appeared in the purely directed case in Section 2.1.
In the precise formulation given in Section 3.3 we will need the following
generalized notion of a trek.

A trek between vertices i and j in a mixed graph G is a triple
$(P_L, P_M, P_R)$ of paths where

1. $P_L$ is a directed path of directed edges with sink i,

2. $P_R$ is a directed path of directed edges with sink j, and

3. $P_M$ is either

• a path consisting of zero or more undirected edges connecting the
source of $P_L$ to the source of $P_R$, or

• a single bidirected edge connecting the source of $P_L$ to the source
of $P_R$.

A trek $(P_L, P_M, P_R)$ is called simple if each of $P_L$, $P_M$, and $P_R$ is
self-avoiding and the only vertices which appear in more than one of the
segments $P_L$, $P_M$, and $P_R$ are the sources of $P_L$ and $P_R$.

The set of all treks between i and j is denoted by T(i,j) and the set of all 
simple treks is S(i,j). Note that T(i,j) might be infinite, because we allow 
the path $P_M$ to have cycles. On the other hand, S(i,j) is always finite.

Definition 2.16. A triple of sets of vertices $(C_L, C_M, C_R)$ t-separates
A from B in the mixed graph G if for every simple trek $(P_L, P_M, P_R)$ with
the sink of $P_L$ in A and the sink of $P_R$ in B, we have that $P_L$ contains
a vertex in $C_L$, $P_R$ contains a vertex in $C_R$, or $P_M$ is an undirected
path that contains a vertex in $C_M$.

Note that the mixed graph definition of t-separation reduces to the di- 
rected acyclic graph version of t-separation when G is a DAG and reduces 
to ordinary graph separation when G is an undirected graph. 

Theorem 2.17 (t-separation for mixed graphs). The matrix $\Sigma_{A,B}$ has
rank at most r for all covariance matrices consistent with the mixed graph G
if and only if there exist three subsets $C_L$, $C_M$, $C_R$ with
$\#C_L + \#C_M + \#C_R \le r$ such that $(C_L, C_M, C_R)$ t-separates A from B.
Consequently,

\[ rk(\Sigma_{A,B}) \le \min\{\#C_L + \#C_M + \#C_R : (C_L, C_M, C_R) \text{ t-separates } A \text{ from } B\}, \]

and equality holds for generic covariance matrices consistent with G.

Since conditional independence statements for Gaussian graphical models
correspond to special low rank submatrices of the covariance matrix,
Theorem 2.17 also gives a characterization of when conditional independence
statements for these mixed graph models hold.

Corollary 2.18. The conditional independence statement
$X_A \perp\!\!\!\perp X_B \mid X_C$ holds for the Gaussian graphical model
associated to the mixed graph G, if and only if there is a partition
$C = C_L \cup C_M \cup C_R$ such that $(C_L, C_M, C_R)$ t-separates
$A \cup C$ from $B \cup C$.

Proof. The conditional independence statement holds if and only if
$\Sigma_{A \cup C, B \cup C}$ has rank $\#C$. By Theorem 2.17 this happens if
and only if there exists $(D_L, D_M, D_R)$ with
$\#D_L + \#D_M + \#D_R \le \#C$ that t-separates $A \cup C$ and $B \cup C$.
But since $C \subseteq D_L \cup D_M \cup D_R$, this occurs if and only if
$C = D_L \cup D_M \cup D_R$ is a partition of C. □

It is worth noting, however, that unlike in the case of directed acyclic 
graphs and undirected graphs, conditional independence statements and 
vanishing minors are not enough to characterize the covariance matrices 
that come from the model. See the Example in Section 8.3.1 of [9]. 

3. Proofs. In this section, we consider the elements $\lambda_{ij}$,
$\phi_{ij}$, and $k_{ij}$ as polynomial variables or indeterminates. When we
speak about $\det \Sigma_{A,B}$ we mean to speak of this polynomial as an
algebraic object without reference to its evaluation at specific values of
$\lambda_{ij}$, $\phi_{ij}$, and $k_{ij}$. Thus the statement that
$\det \Sigma_{A,B}$ is identically equal to zero means that the determinant is
equal to the zero polynomial or power series.

3.1. Proof of Theorem 2.8 (directed graphs). Let G be a directed acyclic
graph with vertex set V(G) = [m]. We assign to each edge $i \to j$ in G the
parameter $\lambda_{ij}$. Let L be the m x m matrix given by
$L_{ij} = \lambda_{ij}$ if $i \to j$ is an edge in G and $L_{ij} = 0$
otherwise. Set $\Lambda = I - L$, where I is the m x m identity matrix. We
assign to each vertex $i \in [m]$ the parameter $\phi_i$, and let $\Phi$ be the
diagonal matrix $\Phi = diag(\phi_1, \ldots, \phi_m)$.

The entries of the matrix $\Lambda^{-1}$ have a well-known combinatorial
interpretation in terms of the directed acyclic graph G.

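Proposition 3.1. For each path P in the directed acyclic graph G, set
$\lambda^P = \prod_{k \to l \in P} \lambda_{kl}$. Then

\[ (\Lambda^{-1})_{ij} = \sum_{P \in \mathcal{P}(i,j)} \lambda^P, \]

where $\mathcal{P}(i,j)$ is the set of all directed paths from i to j.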

Lemma 3.2. Suppose that $A, B \subseteq [m]$ with $\#A = \#B$. Then
$\det \Sigma_{A,B}$ is identically zero if and only if for every set
$S \subseteq [m]$ with $\#S = \#A = \#B$, either
$\det(\Lambda^{-1})_{S,A} = 0$ or $\det(\Lambda^{-1})_{S,B} = 0$.

Proof. Since $\Sigma = \Lambda^{-T} \Phi \Lambda^{-1}$, we have
$\Sigma_{A,B} = (\Lambda^{-T})_{A,[m]} \, \Phi \, (\Lambda^{-1})_{[m],B}$.
We can calculate $\det \Sigma_{A,B}$ by applying the Cauchy-Binet determinant
expansion formula twice on this product. In particular, we obtain

\[ \det \Sigma_{A,B} = \sum_{R,S \subseteq [m]} \det(\Lambda^{-T})_{A,R} \, \det \Phi_{R,S} \, \det(\Lambda^{-1})_{S,B}, \]

where the sum runs over subsets R and S of cardinality $\#A = \#B$. Since
$\Phi$ is a diagonal matrix, $\det \Phi_{R,S} = 0$ unless R = S, in which case
we let $\phi_S$ denote $\det \Phi_{S,S} = \prod_{s \in S} \phi_s$.

Thus, we have the following expansion of $\det \Sigma_{A,B}$:

\[ \det \Sigma_{A,B} = \sum_{S \subseteq [m]} \det(\Lambda^{-T})_{A,S} \, \det(\Lambda^{-1})_{S,B} \, \phi_S
   = \sum_{S \subseteq [m]} \det(\Lambda^{-1})_{S,A} \, \det(\Lambda^{-1})_{S,B} \, \phi_S. \]

Since each monomial $\phi_S$ appears in only one term in this expansion, the
result follows. □



To prove the main theorem, we need two classical results from
combinatorics. The first is Lemma 3.3, the Gessel-Viennot-Lindström Lemma,
which gives a combinatorial expression for expansions of subdeterminants of
the matrix $\Lambda^{-1}$. The second is Theorem 3.6, Menger's Theorem, which
describes a relationship between non-intersecting path families and blocking
sets in a graph.

Lemma 3.3 (Gessel-Viennot-Lindstrom Lemma). J21 Suppose G is a 
directed acyclic graph with vertex set [m] . Let R and S be subsets of [m] with 
$\#R = \#S = \ell$. Then

\[ \det(\Lambda^{-1})_{R,S} = \sum_{P \in N(R,S)} (-1)^{P} \lambda^{P}, \]

where N(R, S) is the set of all collections of non-intersecting systems of
$\ell$ directed paths in G from R to S, and $(-1)^P$ is the sign of the induced
permutation of elements from R to S. In particular,
$\det(\Lambda^{-1})_{R,S} = 0$ if and only if every system of $\ell$ directed
paths from R to S has two paths which share a vertex.
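
The lemma can be verified by enumeration on a toy example. In the sketch below, the entries of $\Lambda^{-1}$ are computed as path sums (Proposition 3.1), the minor is expanded by the Leibniz formula, and the result is compared with the signed sum over vertex-disjoint path systems, where the monomial of a system is taken to be the product of the path monomials of its members. The DAG, the parameter values and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod

def directed_paths(edges, src, dst):
    """All directed paths from src to dst in a DAG, as vertex tuples."""
    if src == dst:
        return [(src,)]
    return [(src,) + rest
            for (a, b) in edges if a == src
            for rest in directed_paths(edges, b, dst)]

def monomial(lam, path):
    """Product of lambda_kl over the edges k -> l of the path."""
    return prod((lam[e] for e in zip(path, path[1:])), start=Fraction(1))

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def minor_by_leibniz(lam, R, S):
    """det (Lambda^{-1})_{R,S} with entries given by path sums."""
    def entry(r, s):
        return sum((monomial(lam, p) for p in directed_paths(lam, r, s)),
                   Fraction(0))
    return sum(sign(perm) * prod(entry(R[a], S[perm[a]])
                                 for a in range(len(R)))
               for perm in permutations(range(len(S))))

def minor_by_lgv(lam, R, S):
    """Signed sum over non-intersecting path systems from R to S."""
    total = Fraction(0)
    for perm in permutations(range(len(S))):
        choices = [directed_paths(lam, R[a], S[perm[a]])
                   for a in range(len(R))]
        for system in product(*choices):
            vertices = [v for p in system for v in p]
            if len(vertices) == len(set(vertices)):  # vertex-disjoint paths
                total += sign(perm) * prod(monomial(lam, p) for p in system)
    return total

# Hypothetical DAG: 1 -> 3, 1 -> 4, 2 -> 3, 2 -> 4, 3 -> 5.
lam = {(1, 3): 2, (1, 4): 3, (2, 3): 5, (2, 4): 7, (3, 5): 11}
free_minor = minor_by_lgv(lam, (1, 2), (3, 4))     # two disjoint systems
blocked_minor = minor_by_lgv(lam, (1, 2), (3, 5))  # every system meets at 3
```

For R = {1, 2} and S = {3, 5} every pair of paths shares the vertex 3, so no non-intersecting system exists and the minor vanishes, as the last sentence of the lemma asserts.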



Consider a system $T = \{T_1, \ldots, T_\ell\}$ of $\ell$ treks from A to B,
connecting $\ell$ distinct vertices $a_i \in A$ to $\ell$ distinct vertices
$b_j \in B$. Let top(T) denote the multiset
$\{top(T_1), \ldots, top(T_\ell)\}$. Note that T consists of two systems of
directed paths, a path system $P_A$ from top(T) to A and a path system $P_B$
from top(T) to B. We say that T has a sided intersection if two paths in
$P_A$ share a vertex or if two paths in $P_B$ share a vertex.

Proposition 3.4. Let A and B be subsets of [m] with #A = #B. Then

\[ \det \Sigma_{A,B} = 0 \]

if and only if every system of (simple) treks from A to B has a sided
intersection.



Proof. Suppose that $\det \Sigma_{A,B} = 0$, and let T be a trek system from A
to B. If all elements of the multiset top(T) are distinct, then Lemma 3.2
implies that either $\det(\Lambda^{-1})_{top(T),A} = 0$ or
$\det(\Lambda^{-1})_{top(T),B} = 0$. If top(T) has repeated elements, then
these determinants are zero, since there are repeated rows. Then Lemma 3.3
implies that there is an intersection in the path system from top(T) to A or
in the path system from top(T) to B, which means that T has a sided
intersection.

Conversely, suppose that every trek system T from A to B has a sided
intersection, and let $R \subseteq [m]$ with #R = #A = #B. If R = top(T) for
some trek system T from A to B, then either the path system from top(T)
to A or the path system from top(T) to B has an intersection. If R is not
the set of top elements for some trek system T, then there is no path system
connecting R to A or there is no path system connecting R to B. In both
cases, Lemma 3.3 implies that either $\det(\Lambda^{-1})_{R,A} = 0$ or
$\det(\Lambda^{-1})_{R,B} = 0$. Lemma 3.2 then implies that
$\det \Sigma_{A,B} = 0$.

We note that it is sufficient to check the systems of simple treks. Given
a trek T from i to j, let LE(T) denote the unique simple trek from i to j
whose edge set is a subset of the edge set of T. Now, if each simple trek
system T has a sided intersection, then every trek system does, namely the
intersection coming from LE(T). □

We define a new DAG associated to G, denoted $\tilde{G}$, which has 2m vertices
$\{1, 2, \ldots, m\} \cup \{1', 2', \ldots, m'\}$ and edges $i \to j$ if
$i \to j$ is an edge in G, $j' \to i'$ if $i \to j$ is an edge in G, and
$i' \to i$ for each $i \in [m]$.

Proposition 3.5. Treks in G from i to j are in bijective correspondence
with directed paths from i' to j in $\tilde{G}$. Simple treks in G from i to j
are in bijective correspondence with directed paths from i' to j in
$\tilde{G}$ that use at most one edge from any pair $a \to b$ and $b' \to c'$,
where $a, b, c \in [m]$.

Proof. Every trek is the union of two paths with a common top. The
part of the trek from the top to i corresponds to the subpath with only
vertices in $\{1', \ldots, m'\}$, and the part of the trek from the top to j
corresponds to the subpath with only vertices in $\{1, \ldots, m\}$. The unique
edge of the form $k' \to k$ corresponds to the top of the trek. Excluding pairs
$a \to b$ and $b' \to c'$ implies that a trek never visits the same vertex b
twice. □
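
The first bijection of Proposition 3.5 can be checked mechanically on a small graph. In the sketch below the primed vertex i' is encoded as the negative integer -i, which is a convention of this example only; the DAG and the function names are likewise hypothetical. A trek $(P_1, P_2)$ is sent to the path that runs through the primed copy of $P_1$ in reverse, crosses the edge top' -> top, and then follows $P_2$.

```python
def directed_paths(edges, src, dst):
    """All directed paths from src to dst in a DAG, as vertex tuples."""
    if src == dst:
        return [(src,)]
    return [(src,) + rest
            for (a, b) in edges if a == src
            for rest in directed_paths(edges, b, dst)]

def doubled_graph(m, edges):
    """Edge list of the doubled DAG; the primed vertex i' is encoded as -i."""
    return ([(a, b) for a, b in edges]               # i -> j
            + [(-b, -a) for a, b in edges]           # j' -> i'
            + [(-i, i) for i in range(1, m + 1)])    # i' -> i

def treks(m, edges, i, j):
    """All treks (P1, P2) from i to j, each path listed from the top down."""
    return [(p1, p2)
            for top in range(1, m + 1)
            for p1 in directed_paths(edges, top, i)
            for p2 in directed_paths(edges, top, j)]

def trek_to_path(p1, p2):
    """Image of a trek: primed copy of P1 reversed, followed by P2."""
    return tuple(-v for v in reversed(p1)) + p2

# Hypothetical DAG: 1 -> 2, 1 -> 3, 2 -> 3.
edges = [(1, 2), (1, 3), (2, 3)]
doubled = doubled_graph(3, edges)
images = {trek_to_path(p1, p2) for p1, p2 in treks(3, edges, 2, 3)}
paths = set(directed_paths(doubled, -2, 3))
```

For this graph there are three treks from 2 to 3, and their images are exactly the three directed paths from 2' to 3 in the doubled graph.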

Menger's theorem (or, more generally, the Max-Flow-Min-Cut Theorem) 
now allows us to turn our sided crossing result on G into a blocking
characterization on $\tilde{G}$.

Theorem 3.6 (Vertex version of Menger's theorem). The cardinality 
of the largest set of vertex disjoint directed paths between two nonadjacent 
vertices u and v in a directed graph is equal to the cardinality of the smallest 
blocking set, where a blocking set is a set of vertices whose removal from the 
graph ensures that there is no directed path from u to v.

Proof of Theorem 2.8. We first focus on the case where $\det \Sigma_{A,B} = 0$,
so that the rank is at most $k - 1$, where $k = \#A = \#B$. According to
Proposition 3.4, every system of $k$ treks from $A$ to $B$ must have a sided
intersection. That is, the number of vertex disjoint paths from $A'$ to $B$ is
at most $k - 1$ in the graph $\tilde{G}$. We add two new vertices to $\tilde{G}$, one vertex $u$
that points to each vertex in $A'$ and one vertex $v$ such that each vertex in
$B$ points to $v$. Thus, there are at most $k - 1$ vertex disjoint paths from $u$ to
$v$. Applying Menger's theorem, there is a blocking set $W$ in $\tilde{G}$ of cardinality
$k - 1$ or less. Set $C_A = \{i \in [m] : i' \in W\}$ and $C_B = \{i \in [m] : i \in W\}$.
Then it is clear that $\#C_A + \#C_B \le k - 1$, and these two sets t-separate $A$
from $B$.

Conversely, suppose there exist sets $C_A$ and $C_B$ with $\#C_A + \#C_B \le k - 1$
which t-separate $A$ from $B$. Then $W = \{i : i \in C_B\} \cup \{i' : i \in C_A\}$ is a
blocking set between $u$ and $v$ as above. Applying Menger's theorem, since
$\#W \le k - 1$, there is no vertex disjoint system of $k$ paths from $A'$ to $B$.
Thus, every trek system from $A$ to $B$ will have a sided intersection, so that
$\det \Sigma_{A,B} = 0$ by Proposition 3.4.

From the special case of determinants, we deduce the general result,
because if the smallest blocking set has size $r$, there exists a collection of $r$
disjoint paths between some subset of $A$ and some subset of $B$, and this is
the largest possible number of paths in such a collection. This means that
all $(r + 1) \times (r + 1)$ minors of $\Sigma_{A,B}$ are zero, but at least one $r \times r$
minor is not zero. Hence $\Sigma_{A,B}$ has rank $r$ for generic choices of the $\Lambda$ and $\Phi$
parameters. □
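The rank statement can be illustrated numerically on a one-factor model. The
sketch below assumes the trek-rule form of the parametrization, in which
$\sigma_{ij}$ is the sum over treks of the variance parameter at the top times the
edge weights along both sides; the parameter values, the vertex labels
$0, \ldots, 4$, and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction

def path_sums(m, lam):
    """M[k][i] = sum over directed paths k -> i of the product of edge weights.

    Assumes vertices 0..m-1 are topologically ordered (edges go low -> high).
    """
    M = [[Fraction(int(i == k)) for i in range(m)] for k in range(m)]
    for k in range(m):
        for i in range(k + 1, m):
            M[k][i] = sum((M[k][p] * w for (p, c), w in lam.items() if c == i),
                          Fraction(0))
    return M

def covariance(m, lam, phi):
    """Trek rule: sigma_ij = sum over tops k of M[k][i] * phi[k] * M[k][j]."""
    M = path_sums(m, lam)
    return [[sum(M[k][i] * phi[k] * M[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# One-factor DAG: 0 -> 1, 0 -> 2, 0 -> 3, 0 -> 4 (arbitrary integer parameters).
lam = {(0, 1): 2, (0, 2): 3, (0, 3): 5, (0, 4): 7}
phi = [11, 13, 17, 19, 23]
Sigma = covariance(5, lam, phi)
A, B = [1, 2], [3, 4]
sub = [[Sigma[a][b] for b in B] for a in A]
sub_rank = rank(sub)
full_rank = rank(Sigma)
```

Here every trek from $A = \{1, 2\}$ to $B = \{3, 4\}$ has its top at vertex 0, so a
single vertex t-separates $A$ from $B$ and the submatrix has rank 1, while the
full covariance matrix has rank 5.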
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3.2. Proof of Theorem 2.15 (undirected graphs). To prove Theorem 2.15 
we will introduce Lemma 3.7, a limited analogue of the Gessel-Viennot-Lindström
Lemma for graphs which are not necessarily acyclic. This version
is a direct corollary of Theorem 6.1 in [5], which, for the sake of simplicity, 
we do not state in full generality. 

Let $G$ be a directed graph, not necessarily acyclic. Let $W$ be the matrix
given by $W_{ij} = w_{ij}$ if $i \to j$ is an edge in $G$ and $W_{ij} = 0$ otherwise.
By standard notions in algebraic graph theory, we can expand the matrix
$(I - W)^{-1}$ as a formal power series in terms of the $w_{ij}$. In particular,

    $((I - W)^{-1})_{ij} = \sum_{P \in \mathcal{P}(i,j)} w^P$,

where $\mathcal{P}(i,j)$ is the set of all (possibly infinitely many) paths from $i$ to $j$ in
$G$, and $w^P$ is the product of the weights of the edges of $P$. This is just
Proposition 3.1 in the general case.
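The expansion can be sanity-checked degree by degree: the $(i,j)$ entry of $W^k$
is the total weight of the paths from $i$ to $j$ with $k$ edges (vertices may
repeat, which is why there can be infinitely many), and the partial sums
telescope against $I - W$. The following sketch uses exact rational arithmetic;
the 3-vertex graph with a 2-cycle and its weights are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def walk_weight_sum(W, i, j, length):
    """Sum of the weights of all directed walks i -> j with `length` edges."""
    n = len(W)
    if length == 0:
        return Fraction(int(i == j))
    total = Fraction(0)
    for middle in product(range(n), repeat=length - 1):
        vertices = (i,) + middle + (j,)
        weight = Fraction(1)
        for a, b in zip(vertices, vertices[1:]):
            weight *= W[a][b]
        total += weight
    return total

# Hypothetical weights: edges 0 -> 1, 1 -> 0 (a 2-cycle) and 1 -> 2.
W = [[Fraction(0), Fraction(1, 2), Fraction(0)],
     [Fraction(1, 3), Fraction(0), Fraction(1, 5)],
     [Fraction(0), Fraction(0), Fraction(0)]]
n, N = 3, 5
powers = [identity(n)]
for _ in range(N + 1):
    powers.append(mat_mul(powers[-1], W))        # powers[k] = W^k, k = 0..N+1

# Entry (i, j) of W^k equals the weighted count of walks with k edges.
walks_match = all(powers[k][i][j] == walk_weight_sum(W, i, j, k)
                  for k in range(4) for i in range(n) for j in range(n))

# Telescoping identity: (I - W)(I + W + ... + W^N) = I - W^(N+1).
partial = [[sum(powers[k][i][j] for k in range(N + 1)) for j in range(n)]
           for i in range(n)]
I_minus_W = [[identity(n)[i][j] - W[i][j] for j in range(n)] for i in range(n)]
lhs = mat_mul(I_minus_W, partial)
rhs = [[identity(n)[i][j] - powers[N + 1][i][j] for j in range(n)] for i in range(n)]
```

When the weights are small, $W^{N+1}$ tends to zero and the partial sums converge
to $(I - W)^{-1}$, which is the sense in which the series is used near a point of
the parametrization.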

Let $A = \{a_1, \ldots, a_\ell\}$ and $B = \{b_1, \ldots, b_\ell\}$ be subsets of $[m]$ with the same
cardinality. The determinant $\det((I - W)^{-1})_{A,B}$ can be written simply in
an expression that involves cancellation as:

(3)  $\det((I - W)^{-1})_{A,B} = \sum_{\tau \in S_\ell,\; P_i \in \mathcal{P}(a_i, b_{\tau(i)})} \operatorname{sign}(\tau) \prod_{i=1}^{\ell} w^{P_i}$.

Deciding whether this formula is nonzero amounts to showing whether or 
not all terms cancel in this formula. This leads to the following version of 
the Gessel-Viennot-Lindstrom Lemma [5]. 

Lemma 3.7. Let $G$ be a directed graph. Let $A = \{a_1, \ldots, a_\ell\}$ and
$B = \{b_1, \ldots, b_\ell\}$ be subsets of $[m]$ with the same cardinality. Then
$\det((I - W)^{-1})_{A,B}$ is identically zero if and only if every system of $\ell$
directed paths from $A$ to $B$ has two paths which share a vertex. Further, if
there is a set of $\ell$ paths $P_1, \ldots, P_\ell$ from $A$ to $B$ which do not have a common
vertex, then $w^{P_1} \cdots w^{P_\ell}$ appears as a monomial with nonzero coefficient in
the power series expansion of $\det((I - W)^{-1})_{A,B}$.

For an undirected graph $G$, we associate to each edge $i - j$ in $G$ a parameter
$\psi_{ij}$. Then let $\Psi_{ij} = \psi_{ij}$ if $i - j$ is an edge in $G$ and $\Psi_{ij} = 0$ otherwise.
Let $\tilde{G}$ be the directed graph formed by replacing each undirected edge in $G$
with two directed edges of weight $\psi_{ij}$, one in each direction.

Corollary 3.8. For this symmetric matrix $\Psi$, the determinant
$\det((I - \Psi)^{-1})_{A,B}$ is identically zero if and only if every system of
$\ell = \#A = \#B$ directed paths from $A$ to $B$ in $\tilde{G}$ has two paths which share
a vertex.

Proof. Lemma 3.7 immediately implies that if every system of directed
paths in $\tilde{G}$ has a crossing, then $\det((I - \Psi)^{-1})_{A,B}$ is identically zero, by
specialization.

To show the converse, we need to verify that, for a fixed $A$ and $B$, each
system $\mathbf{P}$ consisting of self-avoiding paths, no two of which intersect, is the
unique system of its weight $\psi^{\mathbf{P}}$. While $\tilde{G}$ may have multiple path systems of
the same weight $\psi^{\mathbf{P}}$, they must all consist of the same undirected edges in $G$,
and any such system in $\tilde{G}$ can be obtained from any other by switching the
directions of some of the paths. Then, since no two of the paths intersect, we
see that there is only one such system with the correct orientation of paths,
since $A$ and $B$ are fixed. □

Proof of Theorem 2.15. We write $\Sigma = K^{-1} = D^{-1}(I - \Psi)^{-1}D^{-1}$,
where $D$ is the diagonal matrix of standard deviations:
$D = \operatorname{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{mm}})$. We can treat the entries
$\psi_{ij} = k_{ij} \cdot \sqrt{\sigma_{ii}\sigma_{jj}}$ as free parameters.
It suffices to prove a vanishing determinant condition locally near a single
point in the parametrization, so we assume that $\Psi$ is small so that we can
use the power series expansion: $(I - \Psi)^{-1} = I + \Psi + \Psi^2 + \Psi^3 + \cdots$. Applying
Cauchy-Binet as before, we obtain

    $\det \Sigma_{A,B} = \sum_{R,S \subseteq [m]} \det(D^{-1})_{A,R} \det((I - \Psi)^{-1})_{R,S} \det(D^{-1})_{S,B}$
                  $= \det(D^{-1})_{A,A} \det((I - \Psi)^{-1})_{A,B} \det(D^{-1})_{B,B}$,

since $\det(D^{-1})_{A,R} = 0$ if $A \ne R$ and $\det(D^{-1})_{S,B} = 0$ if $B \ne S$. Now,
$\det(D^{-1})_{A,A} \ne 0$ and $\det(D^{-1})_{B,B} \ne 0$, and Corollary 3.8 completes the
proof. □
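The collapse of the Cauchy-Binet sum to a single term relies only on $D$ being
diagonal, so it can be checked with any square matrix in the middle. In the
sketch below the integer matrix `M` stands in for $(I - \Psi)^{-1}$ and `d` holds the
diagonal of $D^{-1}$; both are hypothetical values chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

def det(M):
    """Determinant via the Leibniz formula (adequate for small matrices)."""
    n = len(M)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                         if perm[a] > perm[b])
        term = Fraction((-1) ** inversions)
        for r in range(n):
            term *= M[r][perm[r]]
        total += term
    return total

def submatrix(M, rows, cols):
    return [[M[r][c] for c in cols] for r in rows]

M = [[2, 1, 0, 3], [1, 5, 4, 0], [0, 4, 7, 1], [3, 0, 1, 6]]
d = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(1, 7)]
Sigma = [[d[i] * M[i][j] * d[j] for j in range(4)] for i in range(4)]

A, B = [0, 1], [2, 3]
lhs = det(submatrix(Sigma, A, B))
rhs = det(submatrix(M, A, B))
for a in A:
    rhs *= d[a]          # det(D^{-1})_{A,A}
for b in B:
    rhs *= d[b]          # det(D^{-1})_{B,B}
```

Since the two diagonal minors are nonzero, `lhs` vanishes exactly when the
minor of `M` does, which is the reduction used in the proof.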



3.3. Proof of Theorem 2.17 (mixed graphs). Recall that covariance
matrices consistent with a mixed graph $G$ all have the form


Our first step is a standard argument in the graphical models literature,
which allows us to reduce to the case where there are no bidirected edges in
the graph. This can be achieved by subdividing the bidirected edges; that
is, for each bidirected edge $i \leftrightarrow j$ in the graph, where $i < j$, we replace
$i \leftrightarrow j$ with a vertex $v_{ij}$ and directed edges $v_{ij} \to i$ and $v_{ij} \to j$. The graph
$\tilde{G}$ obtained from $G$ by subdividing all of its bidirected edges is called the
bidirected subdivision of $G$. If $G$ has only directed and bidirected edges, then
$\tilde{G}$ is called the canonical DAG associated to $G$.

Proposition 3.9. Let $A, B \subseteq V(G)$ be two sets of vertices such that
$\#A = \#B$.

1. The generic rank of $\Sigma_{A,B}$ is the same for matrices compatible with $G$
or $\tilde{G}$.

2. There exists a triple $(C_L, C_M, C_R)$ with $\#C_L + \#C_M + \#C_R = r$ that
t-separates $A$ from $B$ in $G$ if and only if there is a triple $(D_L, D_M, D_R)$
with $\#D_L + \#D_M + \#D_R = r$ that t-separates $A$ from $B$ in $\tilde{G}$.

Proof. (1) It suffices to prove that the two parametrizations have the
same Zariski closure (see [2] for the definition and background). This will
follow by showing that near the identity matrix, the two parameterizations
give the same family of matrices. Locally near the identity matrix, the matrix
expression for $\Sigma$ can be expanded as a formal power series in the entries of
$K$, $\Phi$, and $\Lambda$. The expansion for $\sigma_{ij}$ can be expressed as a sum over all treks
$\mathcal{T}(i,j)$ between $i$ and $j$ in $G$. This follows by using the matrix expansions
for paths in $\Lambda^{-1}$ and $K^{-1}$ as we have used in Sections 3.1 and 3.2.
Similarly, the expansion for $\tilde{\sigma}_{ij}$ is the sum over all treks in $\tilde{G}$. Now set

    $\phi_{ij} = \tilde{\phi}_{v_{ij}v_{ij}} \tilde{\lambda}_{v_{ij}i} \tilde{\lambda}_{v_{ij}j}$  and  $\phi_{ii} = \tilde{\phi}_{ii} + \sum_{j \leftrightarrow i} \tilde{\phi}_{v_{ij}v_{ij}} \tilde{\lambda}_{v_{ij}i}^2$.

This transformation shows that these two parametrizations have the same
Zariski closure, since they yield the same formula via sums over the treks in
$G$ and $\tilde{G}$, respectively. The point is that since we assume that we are close
to the identity matrix, it is also possible to go back and forth between $G$
and $\tilde{G}$ parameters. In particular, since we are close to the identity matrix,
$\phi_{ij}$ is small. So we can choose $\tilde{\phi}_{v_{ij}v_{ij}} = \epsilon > 0$ and set
$\tilde{\lambda}_{v_{ij}i} = \sqrt{|\phi_{ij}|/\epsilon}$ and $\tilde{\lambda}_{v_{ij}j} = \operatorname{sign}(\phi_{ij})\sqrt{|\phi_{ij}|/\epsilon}$.
The small size of the $\phi_{ij}$ guarantees that we
can find a positive $\tilde{\phi}_{ii}$ satisfying the second equation. The smallness of $\epsilon$
guarantees that $\tilde{\Phi}$ is positive definite.

(2) Any t-separating set in $G$ is clearly a t-separating set in $\tilde{G}$. Suppose
that $(D_L, D_M, D_R)$ is a minimal t-separating set in $\tilde{G}$; that is, if any vertex
is deleted from $(D_L, D_M, D_R)$ we no longer have a t-separating set. It is easy
to see that $D_M$ will not contain any vertices $v_{ij}$ in a minimal t-separating
set of $\tilde{G}$, so that $D_M \subseteq V(G)$. It clearly suffices to show that each minimal
t-separating set in $\tilde{G}$ yields a t-separating set in $G$ of at most the same
size. We define

    $C_L = (D_L \cap V(G)) \cup \{i : v_{ij} \in D_L\}$,
    $C_M = D_M$,
    $C_R = (D_R \cap V(G)) \cup \{j : v_{ij} \in D_R\}$.






If our t-separating set in $\tilde{G}$ contains none of the vertices $v_{ij}$, then it is
clearly a t-separating set in $G$; otherwise, the way that $i$ and $j$ are chosen in
$\{i : v_{ij} \in D_L\}$ and $\{j : v_{ij} \in D_R\}$ is important. Given a vertex $v_{ij}$ in the
t-separating set, let $\mathcal{T}(v_{ij})$ denote the set of treks $T = (T_L, T_M, T_R)$ from $A$
to $B$ such that $T_L \cap D_L = \{v_{ij}\}$ or $T_R \cap D_R = \{v_{ij}\}$. Since $(D_L, D_M, D_R)$ is
minimal, we see that $\mathcal{T}(v_{ij})$ must be nonempty. This implies that in every
trek $T = (T_L, T_M, T_R) \in \mathcal{T}(v_{ij})$, up to relabeling, $i$ occurs in $T_L$, whose sink
lies in $A$, and $j$ occurs in $T_R$, whose sink lies in $B$. For if there were a trek
from $A$ to $B$ in $\mathcal{T}(v_{ij})$ that had $j$ in $T_L$ or $i$ in $T_R$, we could patch two halves
of these treks together to find a trek from $A$ to $B$ that did not have a sided
intersection with $(D_L, D_M, D_R)$. If $i$ lies in $T_L$ and $j$ lies in $T_R$ in such treks,
then we add $i$ to $C_L$ when $v_{ij} \in D_L$, and we add $j$ to $C_R$ when $v_{ij} \in D_R$.
Then the triple $(C_L, C_M, C_R)$ has $\#C_L + \#C_M + \#C_R \le \#D_L + \#D_M + \#D_R$
and also t-separates $A$ from $B$. □

Remark. The parameterization using the bidirected subdivision G typ- 
ically yields a smaller set of covariance matrices than the original graph G. 
However, these sets have the same dimension and the same Zariski closure. 
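The subdivision construction itself is purely combinatorial and can be
sketched in a few lines. The edge-list representation, the function name, and
the tuple `("v", i, j)` used as a label for the new vertex $v_{ij}$ are
assumptions made for illustration.

```python
def bidirected_subdivision(directed, bidirected):
    """Replace each bidirected edge i <-> j by a new vertex v_ij -> i, v_ij -> j."""
    new_vertices = []
    new_directed = list(directed)
    for i, j in bidirected:
        i, j = min(i, j), max(i, j)      # convention i < j
        v = ("v", i, j)                  # hypothetical label for v_ij
        new_vertices.append(v)
        new_directed += [(v, i), (v, j)]
    return new_vertices, new_directed

# Example: directed edge 1 -> 2 and bidirected edge 2 <-> 3.
added, edges = bidirected_subdivision([(1, 2)], [(3, 2)])
```

The resulting graph has no bidirected edges, one new vertex per bidirected
edge, and two new directed edges per bidirected edge.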

Before getting to the general case of mixed graphs, we first need to handle 
the special case of mixed graphs that do not have undirected edges. 

Lemma 3.10. Suppose that $G$ is a mixed graph without undirected edges.
The matrix $\Sigma_{A,B}$ has rank at most $r$ for all covariance matrices consistent
with the mixed graph $G$ if and only if there exist subsets $C_L, C_R \subseteq V(G)$
with $\#C_L + \#C_R \le r$ such that $(C_L, \emptyset, C_R)$ t-separates $A$ from $B$.



Proof. Due to Proposition 3.9 this immediately reduces to the case of
directed acyclic graphs, so that we may apply Theorem 2.8. □



Now that we have removed the bidirected edges, we assume that our
matrix factorization has the following form:

    $\Sigma = \Lambda^{-T} K^{-1} \Lambda^{-1}$,

and we prepare to apply the Cauchy-Binet determinant expansion formula.
That is, for two subsets $A, B \subseteq [m]$, with $\#A = \#B$, we have

(4)  $\det \Sigma_{A,B} = \sum_{S \subseteq [m]} \sum_{T \subseteq [m]} \det(\Lambda^{-T})_{A,S} \cdot \det(K^{-1})_{S,T} \cdot \det(\Lambda^{-1})_{T,B}$,

where the sums range over the sets $S, T \subseteq [m]$ with $\#S = \#T = \#A = \#B$.

We say that a set of treks $\{(P_{L_i}, P_{M_i}, P_{R_i}) : i \in [\ell]\}$ has a sided crossing if
there are indices $i_1 \ne i_2 \in [\ell]$ such that either $P_{L_{i_1}}$ and $P_{L_{i_2}}$ share a vertex,
$P_{M_{i_1}}$ and $P_{M_{i_2}}$ share a vertex, or $P_{R_{i_1}}$ and $P_{R_{i_2}}$ share a vertex.
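The definition of a sided crossing translates directly into a pairwise check.
In this sketch each trek is represented as a triple of vertex sequences
`(P_L, P_M, P_R)`; that representation, the function name, and the example
treks are hypothetical.

```python
from itertools import combinations

def has_sided_crossing(trek_system):
    """True if two treks share a vertex on the same side (left, middle, right)."""
    for (L1, M1, R1), (L2, M2, R2) in combinations(trek_system, 2):
        if set(L1) & set(L2) or set(M1) & set(M2) or set(R1) & set(R2):
            return True
    return False

# Vertex 2 lies on the right side of both treks: a sided crossing.
crossing = has_sided_crossing([((1,), (1, 2), (2, 3)),
                               ((4,), (4, 5), (5, 2))])

# Vertex 3 lies on the right side of the first trek but only on the left and
# middle parts of the second, so this system has no sided crossing.
no_crossing = has_sided_crossing([((6, 1), (6,), (6, 3)),
                                  ((3, 2), (3, 7), (7, 4))])
```

The second example shows why the notion is "sided": the two treks meet at a
vertex, but not on the same side.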

Lemma 3.11. Let $\#A = \#B = r$. Suppose that every system of $r$ treks
from $A$ to $B$ in a mixed graph $G$ (consisting of directed and undirected edges)
has a sided crossing. Then for every $S, T \subseteq V(G)$ with $\#S = \#T = r$, we
have $\det(\Lambda^{-T})_{A,S} \cdot \det(K^{-1})_{S,T} \cdot \det(\Lambda^{-1})_{T,B} = 0$.

Proof. Consider the trek systems from $A$ to $B$ that consist of a directed
path system $P_L$ from $S$ to $A$, an undirected path system $P_M$ from $S$ to $T$,
and a directed path system $P_R$ from $T$ to $B$. We call such a system of treks
an $(S, T)$-trek system from $A$ to $B$.

We claim that if every trek system from $A$ to $B$ has a sided crossing, then
either all $(S, T)$-trek systems have a crossing in $P_L$, all $(S, T)$-trek systems
have a crossing in $P_M$, or all $(S, T)$-trek systems have a crossing in $P_R$.
Suppose this is not the case; then there is a directed path system from $S$
to $A$ with no crossing, an undirected path system from $S$ to $T$ with no
crossing, and a directed path system from $T$ to $B$ with no crossing, yielding
an $(S, T)$-trek system from $A$ to $B$ with no sided crossing.

Applying the claim, along with the directed and undirected versions of
the Gessel-Viennot-Lindström Lemma (Lemma 3.3 and Corollary 3.8), we
deduce that one of $\det(\Lambda^{-T})_{A,S}$, $\det(K^{-1})_{S,T}$, or $\det(\Lambda^{-1})_{T,B}$ is identically
zero. This implies that their product is zero. □



Lemma 3.11 is enough to handle one direction of Theorem 2.17. For the
other direction, we need slightly more machinery. Using our presentation for
undirected graphs, we can write

    $K^{-1} = D^{-1}(I - W)^{-1}D^{-1}$,

where $D$ is the diagonal matrix of standard deviations and $W_{ij} = w_{ij} = w_{ji}$
if $i - j \in E(G)$, and $W_{ij} = 0$ otherwise. Thus,

    $\Sigma = \Lambda^{-T} D^{-1}(I - W)^{-1}D^{-1} \Lambda^{-1}$.

Using the standard argument of algebraic graph theory, we can expand this
near the identity matrix as a power series

    $\sigma_{ij} = \sum_{(P_L, P_M, P_R) \in \mathcal{T}(i,j)} \lambda^{P_L} d^{-1}_{s(P_L)} w^{P_M} d^{-1}_{s(P_R)} \lambda^{P_R}$,

where $s(P)$ denotes the source of the directed path $P$. Thus if $A = \{a_1, \ldots, a_\ell\}$
and $B = \{b_1, \ldots, b_\ell\}$,

(5)  $\det \Sigma_{A,B} = \sum_{\tau \in S_\ell,\; (P_{L_i}, P_{M_i}, P_{R_i}) \in \mathcal{T}(a_i, b_{\tau(i)})} \operatorname{sign}(\tau) \prod_{i=1}^{\ell} \lambda^{P_{L_i}} d^{-1}_{s(P_{L_i})} w^{P_{M_i}} d^{-1}_{s(P_{R_i})} \lambda^{P_{R_i}}$.

Lemma 3.12. Suppose that there exists a system of treks from
$A = \{a_1, \ldots, a_\ell\}$ to $B = \{b_1, \ldots, b_\ell\}$ without sided crossing. Then $\det \Sigma_{A,B}$ is
not zero.

Proof. If such a system of treks exists, then there also exists a $\tau \in S_\ell$
and a system of simple treks $T_i = (P_{L_i}, P_{M_i}, P_{R_i}) \in \mathcal{S}(a_i, b_{\tau(i)})$, $i = 1, \ldots, \ell$,
without sided intersection. Let $G'$ be the graph obtained from $G$ by deleting
all edges that do not appear in any of the $T_i$. The determinant obtained from
$\det \Sigma_{A,B}$ by setting all parameters corresponding to edges outside $G'$ equal to
zero is exactly the determinant of the corresponding matrix $\Sigma'_{A,B}$ for $G'$; it
suffices to show that this latter determinant is non-zero.

To do this, we construct a third graph $G''$ from $G'$ by introducing, for each
$i$ for which $P_{M_i}$ is not empty, a bidirected edge $s(P_{L_i}) \leftrightarrow s(P_{R_i})$ with label
$\phi_{s(P_{L_i}), s(P_{M_i})}$, and deleting all undirected edges. By Lemma 3.10 we have
$\det \Sigma''_{A,B} \ne 0$. But then this determinant remains non-zero after specialising
the parameters $\phi_{s(P_{L_i}), s(P_{M_i})}$ to the monomials $d^{-1}_{s(P_{L_i})} w^{P_{M_i}} d^{-1}_{s(P_{R_i})}$; here we
use that, as the $P_{M_i}$ are disjoint, these $\ell$ monomials contain disjoint sets
of variables. The resulting non-zero expression is the subsum of the
$G'$-analogue of (5) over all terms for which the $W$-part of the monomial equals
$\prod_{i=1}^{\ell} (w^{P_{M_i}})^{e_i}$ for some exponents $e_1, \ldots, e_\ell \in \{0, 1\}$. Indeed, if a system of
treks $(T'_i = (P'_{L_i}, P'_{M_i}, P'_{R_i}))_i$ from $A$ to $B$ in $G'$ has $\prod_{i=1}^{\ell} (w^{P_{M_i}})^{e_i}$ as the
$W$-part of its monomial, then since the $P_{M_i}$ are self-avoiding and mutually
disjoint, the non-empty middle parts $P'_{M_i}$ form the subset of the non-empty
$P_{M_i}$ for which $e_i$ equals 1 (potentially up to traversing some of these paths
in the opposite direction). Hence the trek monomial of $(T'_1, \ldots, T'_\ell)$ comes,
under the specialisation above, from the monomial of a unique trek system in $G''$
of the same sign. This proves that $\det \Sigma'_{A,B}$ is non-zero, whence the lemma
follows. □

Proof of Theorem 2.17. By Proposition 3.9 we can assume that there
are no bidirected edges in $G$. It suffices to handle the case where
$\#A = \#B = r + 1$. Lemmas 3.11 and 3.12 imply that $\det \Sigma_{A,B} = 0$ if and only if
every system of $r + 1$ treks from $A$ to $B$ has a sided intersection. We wish to
apply Menger's theorem. To do this, we introduce a new graph $\tilde{G}$ with $3m$
vertices, namely $\{1, \ldots, m\} \cup \{1', \ldots, m'\} \cup \{1'', \ldots, m''\}$. This is analogous to
our previous definitions of $\tilde{G}$, but accounts for both directed and undirected
edges. The edge set of $\tilde{G}$ consists of precisely those edges

• $i \to j$ and $j' \to i'$, where $i \to j$ is a directed edge of $G$,

• $i'' \to j''$ and $j'' \to i''$, where $i - j$ is an undirected edge of $G$, and

• $i' \to i''$ and $i'' \to i$, where $i \in [m]$ is a vertex of $G$.

Treks between $i$ and $j$ in $G$ are in bijective correspondence with directed
paths between $i'$ and $j$ in $\tilde{G}$. Thus, the vertex version of Menger's theorem
implies that there must exist $C'_L \subseteq \{1', \ldots, m'\}$, $C''_M \subseteq \{1'', \ldots, m''\}$, and
$C_R \subseteq \{1, \ldots, m\}$ such that every path from $A'$ to $B$ in $\tilde{G}$ intersects one
of these sets, and such that $\#C'_L + \#C''_M + \#C_R \le r$. But then the triple
$(C_L, C_M, C_R)$ t-separates $A$ from $B$ in $G$, where $C_L = \{c : c' \in C'_L\}$ and
$C_M = \{c : c'' \in C''_M\}$. □
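The three-layer graph used in this proof can be built mechanically from the
edge lists of $G$. The sketch below encodes $i$, $i'$, $i''$ as `(i, 0)`, `(i, 1)`,
`(i, 2)`; this encoding, the function names, and the example graph are
assumptions for illustration only.

```python
def tripled_graph(m, directed, undirected):
    """Edge list of the 3m-vertex graph: (i, 0) = i, (i, 1) = i', (i, 2) = i''."""
    edges = []
    for i, j in directed:
        edges += [((i, 0), (j, 0)), ((j, 1), (i, 1))]
    for i, j in undirected:
        edges += [((i, 2), (j, 2)), ((j, 2), (i, 2))]
    for i in range(1, m + 1):
        edges += [((i, 1), (i, 2)), ((i, 2), (i, 0))]
    return edges

def reachable(edges, start):
    """Set of vertices reachable from start along directed edges."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

# Hypothetical mixed graph: undirected edge 1 - 2, directed edges 1 -> 3, 2 -> 4.
with_edge = tripled_graph(4, [(1, 3), (2, 4)], [(1, 2)])
without_edge = tripled_graph(4, [(1, 3), (2, 4)], [])
trek_exists = (4, 0) in reachable(with_edge, (3, 1))
trek_missing = (4, 0) not in reachable(without_edge, (3, 1))
```

The trek from 3 to 4 through the undirected edge $1 - 2$ corresponds to the
path $3' \to 1' \to 1'' \to 2'' \to 2 \to 4$; removing the undirected edge removes both.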

4. Conclusions and Open Problems. We have shown that the t- 
separation criterion can be used to characterize vanishing determinants of 
the covariance matrix in Gaussian directed and undirected graphical models 
and mixed graph models. These results have potential uses in inferential 
procedures with Gaussian graphical models, generalizing procedures based 
on the tetrad constraints [10] in directed graphical models. The tetrad
constraints are the special case of 2 x 2 determinants. Both referees have pointed
out that these results also extend to graphical models with cycles, by
applications of the more general version of the Gessel-Viennot-Lindström lemma
for general graphs [5]. We have focused on the case of directed acyclic graphs
because these are the most familiar in the graphical models literature. 

Our results suggest a number of different research directions. For exam- 
ple, for which mixed graphs is it true that vanishing low rank submatrices 
characterize the distributions that belong to the model? This is known to 
hold for both acyclic directed graphs and undirected graphs, but can fail in 
general mixed graphs. 

Another open problem is to determine what significance the t-separation 
criterion has for graphical models with not necessarily normal random vari- 
ables, in particular, for discrete variables. It would be worthwhile to deter- 
mine whether t-separation can be translated into constraints on probability 
densities for graphical models with more general random variables. 
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